13 分布式架构经典图书和论文

经典图书

Distributed Systems for fun and profit 以一种更易于理解的方式，讲述以亚马逊的 Dynamo、谷歌的 Bigtable 和 MapReduce 等为代表的分布式系统背后的核心思想
Designing Data Intensive Applications 本书深入浅出地用很多的工程案例讲解了如何让数据结点做扩展
Distributed Systems: Principles and Paradigms 介绍了分布式系统的七大核心原理，并给出了大量的例子
Scalable Web Architecture and Distributed Systems 主要针对面向互联网（公网）的分布式系统
Principles of Distributed Systems 讲述了多种分布式系统中会用到的算法

经典论文

分布式事务

《Transaction Across DataCenter》（YouTube 视频）Google I/O 大会上的演讲

Paxos 一致性算法

一种基于消息传递且具有高度容错特性的一致性算法

Bigtable: A Distributed Storage System for Structured Data
The Chubby lock service for loosely-coupled distributed systems
The Google File System
MapReduce: Simplified Data Processing on Large Clusters
Neat Algorithms - Paxos

Raft 一致性算法

In search of an Understandable Consensus Algorithm (Extended Version)
Raft - The Secret Lives of Data
Raft Consensus Algorithm
Raft Distributed Consensus Algorithm Visualization

Gossip 一致性算法

Dynamo: Amazon’s Highly Available Key Value Store 讲述 Amazon 的 DynamoDB 是如何满足系统的高可用、高扩展和高可靠的
Time, Clocks and the Ordering of Events in a Distributed System 主要解决分布式系统中的时钟同步问题
马萨诸塞大学课程 Distributed Operating System 中第 10 节 Clock Synchronization 讲述了时钟同步的问题
Why Vector Clocks are Easy 和 Why Vector Clocks are Hard Vector Clock相关
Efficient Reconciliation and Flow Control for Anti-Entropy Protocols 用来做数据同步的 Gossip 协议的原始论文
Understanding Gossip (Cassandra Internals) Gossip 协议也是 NoSQL 数据库 Cassandra 中使用到的数据协议
Gossip Visualization 关于 Gossip 的一些图示

分布式存储和数据库

Amazon Aurora: Design Considerations for High Throughput Cloud -Native Relation Databases
Spanner: Google’s Globally-Distributed Database
F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business
Cassandra: A Decentralized Structured Storage System
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data

分布式消息系统

Kafka: a Distributed Messaging System for Log Processing
Wormhole: Reliable Pub-Sub to Support Geo-replicated Internet Services
All Aboard the Databus! LinkedIn’s Scalable Consistent Change Data Capture Platform

日志和数据

The Log: What every software engineer should know about real-time data’s unifying abstraction
The Log-Structured Merge-Tree (LSM-Tree)
Immutability Changes Everything
Tango: Distributed Data Structures over a Shared Log）

分布式监控和跟踪

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

数据分析

The Unified Logging Infrastructure for Data Analytics at Twitter Twitter 公司的一篇关于日志架构和数据分析的论文
Scaling Big Data Mining Infrastructure: The Twitter Experience 讲 Twitter 公司的数据分析平台是怎么做的
Dremel: Interactive Analysis of Web-Scale Datasets 介绍了Google 公司的 Dremel 的架构与实现，以及它与 MapReduce 是如何互补的
Resident Distributed Datasets: a Fault-Tolerant Abstraction for In-Memory Cluster Computing 论文提出了弹性分布式数据集（Resilient Distributed Dataset，RDD）的概念

与编程相关的论文

Distributed Programming Model
PSync: a partially synchronous language for fault-tolerant distributed algorithms
Programming Models for Distributed Computing
Logic and Lattices for Distributed Programming

其它的分布式论文阅读列表

Services Engineering Reading List
Readings in Distributed Systems
Google Research - Distributed Systems and Parallel Computing

经典图书​

经典论文​

分布式事务​

Paxos 一致性算法​

Raft 一致性算法​

Gossip 一致性算法​

分布式存储和数据库​

分布式消息系统​

日志和数据​

分布式监控和跟踪​

数据分析​

与编程相关的论文​

其它的分布式论文阅读列表​

经典图书

经典论文

分布式事务

Paxos 一致性算法

Raft 一致性算法

Gossip 一致性算法

分布式存储和数据库

分布式消息系统

日志和数据

分布式监控和跟踪

数据分析

与编程相关的论文

其它的分布式论文阅读列表